Skip to content

[Dataflow Streaming] Activate SourceState Finalizers before submitting workitem to harness threads#38921

Merged
scwhittle merged 3 commits into
apache:masterfrom
arunpandianp:finalize
Jun 12, 2026
Merged

[Dataflow Streaming] Activate SourceState Finalizers before submitting workitem to harness threads#38921
scwhittle merged 3 commits into
apache:masterfrom
arunpandianp:finalize

Conversation

@arunpandianp

Copy link
Copy Markdown
Contributor

Adding a filter for failed items in #38920. Making this change so finalizers from failed items get scheduled even if the work item gets cancelled.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the robustness of Dataflow streaming work scheduling by ensuring that finalizers associated with work items are processed immediately upon scheduling. By moving the finalization logic into the scheduling flow, the system guarantees that finalizers for failed items are executed even if the corresponding work item is subsequently cancelled, preventing potential resource leaks or missed cleanup tasks.

Highlights

  • Finalizer Lifecycle Management: Refactored the scheduling logic to ensure SourceState finalizers are activated before work items are submitted to harness threads, improving reliability for failed items.
  • API Consolidation: Removed the explicit queueAppliedFinalizeIds method in favor of passing appliedFinalizeIds directly into the scheduleWork method, centralizing the finalization trigger.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@arunpandianp

Copy link
Copy Markdown
Contributor Author

R: @scwhittle

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the handling of applied finalize IDs in the streaming worker. Instead of queueing them separately via queueAppliedFinalizeIds, they are now passed directly to StreamingWorkScheduler.scheduleWork and finalized alongside the work item's source state finalize IDs before work is scheduled. A review comment suggests adding empty checks before calling commitFinalizer.finalizeCommits to avoid unnecessary overhead and potential lock contention on the critical path when there are no finalize IDs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@github-actions

Copy link
Copy Markdown
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@arunpandianp arunpandianp marked this pull request as draft June 11, 2026 13:57
@arunpandianp arunpandianp marked this pull request as ready for review June 11, 2026 14:03

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request moves the execution of pending OnCommit callbacks (commitFinalizer.finalizeCommits) from the processWork method to the scheduleWork method in StreamingWorkScheduler.java. The reviewer raised a high-severity concern that executing these finalizers synchronously on the scheduling/dispatching thread can block the main dispatching loop, potentially leading to severe performance bottlenecks or thread starvation if the finalizers perform blocking I/O or run arbitrary user code. It is recommended to offload this execution asynchronously or handle it within the work item's lifecycle.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@scwhittle scwhittle merged commit 705db25 into apache:master Jun 12, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants